Method and apparatus for accessing unaligned data

ABSTRACT

A method and apparatus for accessing data from a memory. The method includes masking off a portion of a first memory address, and accessing a first unit of data corresponding to the first memory address. In addition, the method includes adding a predetermined offset to the first memory address to generate a second memory address, and accessing a second unit of data corresponding to the second memory address. Thereafter, a section of the first unit of data is shifted off, and a separate section from the second unit of data is shifted off. Next, the first unit of data and the second unit of data are joined.

FIELD OF THE INVENTION

The present invention relates to computers, and, in particular, to accessing unaligned data from a memory device.

BACKGROUND OF THE INVENTION

Memory lines are typically divided into cache line boundaries. In the example illustrated in FIG. 1, the cache lines are sub-divided into 8-byte boundaries. If a memory address corresponds to a boundary line, the memory address is considered an “aligned access”. If a memory address does not correspond to a boundary line, it is considered an unaligned access, and can typically take 2.5 times longer to access.

The number of unaligned accesses is typically high in computer applications. As a result, the memory latency associated with the unaligned accesses creates a bottleneck effect that limits the performance of image/video processing and other applications.

SUMMARY OF THE INVENTION

The present invention provides a method and apparatus for accessing data from a memory. The method includes masking off a portion of a first memory address, and accessing a first unit of data corresponding to the first memory address. In addition, the method includes adding a predetermined offset to the first memory address to generate a second memory address, and accessing a second unit of data corresponding to the second memory address. Thereafter, a section of the first unit of data is shifted off, and a separate section from the second unit of data is shifted off. Next, the first unit of data and the second unit of data are joined.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates cache lines sub-divided into cache line boundaries.

FIG. 2 illustrates the MOVQ operation used in one embodiment in the method of the present invention.

FIG. 3 illustrates the packed bit-wise logical OR operation used in one embodiment in the method of the present invention.

FIG. 4 illustrates the packed shift right logical operation used in one embodiment in the method of the present invention.

FIGS. 5a-b illustrate a flow diagram describing the steps of accessing cache-aligned data according to one embodiment.

FIG. 6 illustrates a computer system having a computer readable medium with instructions stored thereon according to one embodiment of the present invention.

DETAILED DESCRIPTION

A method and apparatus for accessing unaligned or segmented data is disclosed. In the following description, for purposes of explanation, specific nomenclature is set forth to provide a thorough understanding of the present invention. However, it will be apparent to one skilled in the art that these specific details are not required in order to practice the present invention. For example, the present invention at times herein may be described with reference to the Intel architecture and the related instruction set. However, the same techniques can be applied to other processor architectures and instruction sets.

Packed Data Formats and Related MMX Instructions

In one embodiment, the method and apparatus for accessing unaligned data is provided for accessing data units stored in packed data formats in the memory/cache. For example, in one embodiment, the data is grouped in sixty-four-bit data groups. The packed data can typically be in one of three formats: a packed byte format, a packed word format, or a packed double word (dword) format. Packed data in a packed byte format includes eight separate 8-bit data elements. Packed data in a packed word format includes four separate 16-bit data elements, and packed data in a packed dword format includes two separate 32-bit data elements. The packed data is typically operated on in multimedia registers 64 bits in length.
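By way of illustration only, the three packed formats can be pictured as different views of the same sixty-four bits. The following C sketch assumes a hypothetical type name, packed64, which does not appear in the embodiments; it merely shows the element counts and widths described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration type: one 64-bit packed data group. */
typedef union {
    uint8_t  bytes[8];   /* packed byte format: eight 8-bit elements  */
    uint16_t words[4];   /* packed word format: four 16-bit elements  */
    uint32_t dwords[2];  /* packed dword format: two 32-bit elements  */
    uint64_t quad;       /* the whole sixty-four-bit data group       */
} packed64;

/* All views occupy the same eight bytes. */
static_assert(sizeof(packed64) == 8, "a packed data group is 64 bits");
```

Writing through one member and reading through another reinterprets the same bytes; which byte holds the low-order part of a word or dword depends on the byte order of the machine.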

In one embodiment described herein, the packed data is accessed using a routine consisting, in part, of instructions selected from the Intel MMX instruction set. In alternative embodiments, other similar packed data instruction sets could be used without departing from the present invention. The Intel MMX instructions that are used in one embodiment are discussed below.

FIG. 2 illustrates an example of the MOVQ instruction. The MOVQ instruction is used to transfer sixty-four data bits, eight packed bytes, to and from the registers. As shown in FIG. 2, packed data 310, having packed bytes 311, 312, 313, 314, 315, 316, 317 and 318 located in memory, is transferred to a register and stored as data elements 321, 322, 323, 324, 325, 326, 327 and 328.

FIG. 3 illustrates the POR operation. In the POR operation a bit-wise logical OR is performed on packed data sequences 410 and 420. The result is placed in packed data sequence 430 as illustrated in FIG. 3.

FIG. 4 illustrates the PSRL (Pack Shift Right Logical) instruction. The instruction independently shifts each data element of packed data sequence 510 to the right by the scalar shift count. In order to shift each individual packed word, double word, or the entire packed data sequence, by the shift count, the PSRL instruction is codified as PSRLW, PSRLD, or PSRLQ, respectively. The high-order bits of each element are filled with zeros. The shift count is interpreted as unsigned.
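As a rough illustration of the element-wise behavior described above, the following C sketch emulates the three shift-right forms on an ordinary 64-bit integer. The function names borrow the mnemonics but are hypothetical helpers, not processor intrinsics. The sketch also assumes the MMX rule that a count larger than the element width minus one produces all zeros.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates PSRLW: four 16-bit elements, each shifted right independently. */
static uint64_t psrlw(uint64_t v, unsigned count)
{
    if (count > 15)
        return 0;                                   /* element fully shifted out */
    uint64_t lane = 0xFFFFu >> count;               /* bits that survive in one element */
    uint64_t mask = lane * 0x0001000100010001ULL;   /* replicate into all four elements */
    return (v >> count) & mask;                     /* drop bits that crossed an element edge */
}

/* Emulates PSRLD: two 32-bit elements, each shifted right independently. */
static uint64_t psrld(uint64_t v, unsigned count)
{
    if (count > 31)
        return 0;
    uint64_t lane = 0xFFFFFFFFu >> count;
    uint64_t mask = lane * 0x0000000100000001ULL;
    return (v >> count) & mask;
}

/* Emulates PSRLQ: the entire 64-bit sequence shifted right as one element. */
static uint64_t psrlq(uint64_t v, unsigned count)
{
    return count > 63 ? 0 : v >> count;             /* guard: shifting by 64 is undefined in C */
}
```

The masks are what keep the elements independent: a plain 64-bit shift would let bits from one element spill into the high-order positions of its right-hand neighbor, whereas the packed forms fill those positions with zeros.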

The PSLL (Pack Shift Left Logical) instruction is performed in the same manner as the PSRL instruction. In the PSLL instruction, each data element is independently shifted to the left by the scalar shift count. Moreover, the low-order bits of each element are filled with zeros. In order to shift each individual packed word, or double word, by the shift count, the PSLL instruction is codified as PSLLW and PSLLD.

Method for Accessing Unaligned Data

FIGS. 5a-b illustrate a flow diagram describing the steps for an improved method of accessing unaligned data with a memory address that corresponds to a unit of data unaligned with a cache-line/quadword boundary (hereinafter referred to as the “unaligned memory address”). The embodiment described below is provided with reference to accessing a cache/memory device with eight-byte line boundaries. In alternative embodiments, the steps below could be modified to access cache devices having different length boundaries.

In step 602, the memory address for a unit of data (e.g., a quadword, 8 bytes), which may or may not be aligned with a quadword boundary, is stored in a first register. Subsequent step(s) of the method determine whether the memory address is aligned with a quadword boundary. If the memory address is not aligned, innovative subsequent steps of the method access the data in a faster manner. In step 604, a copy of the memory address is stored in a second register.

In step 606, an address for the lower quadword boundary to the left of the unaligned unit of data is generated. In one embodiment, the address for the lower quadword boundary (aligned memory address) to the left is generated by masking off the low order 3 bits of the first copy of the unaligned memory address, stored in the first register. Clearing the 3 low order bits of the first copy of the unaligned memory address has the effect of decreasing the value of the memory address by up to 7, which corresponds with 7 byte positions to the left. In one embodiment, the bits are masked off by performing a logical AND operation between the first copy of the unaligned memory address and the value 0xfffffff8.

In step 608, all bits but for the low order 3 bits are masked off of the second copy of the unaligned memory address, stored in the second register. The result indicates how far off the memory address is from the quadword boundary to the left. In step 610, it is determined whether the results of step 608 equal 000. If the results of step 608 equal 000, the original memory address is aligned with a quadword boundary. As a result, in step 611 the address stored in the first register is used to access a quadword and the steps of the method are completed.
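The two masking operations of steps 606 and 608 can be sketched in C as follows. The structure and function names are hypothetical. The sketch also assumes a mask written as ~(uintptr_t)7, which equals 0xfffffff8 for a 32-bit address and extends naturally to wider addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: the two values produced from one memory address. */
typedef struct {
    uintptr_t aligned;  /* quadword boundary at or below the address (step 606) */
    unsigned  offset;   /* bytes from that boundary to the address, 0..7 (step 608) */
} split_address;

static split_address split_addr(uintptr_t addr)
{
    split_address s;
    s.aligned = addr & ~(uintptr_t)7;   /* clear the low order 3 bits */
    s.offset  = (unsigned)(addr & 7);   /* keep only the low order 3 bits */
    assert(s.aligned + s.offset == addr);
    return s;
}

/* Step 610: an offset of 000 means the original address was already aligned. */
static int is_quadword_aligned(uintptr_t addr)
{
    return split_addr(addr).offset == 0;
}
```

For example, the address 0x1003 splits into the boundary 0x1000 and an offset of 3, while 0x2000 yields an offset of 0 and is therefore handled by the aligned path of step 611.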

On the other hand, if the results of step 608 do not equal 000, then in step 612, using the aligned memory address stored in the first register, a unit of data is loaded from the cache to a third register (hereinafter referred to as the lower quadword). In an embodiment using MMX instructions, the MOVQ instruction is used to transfer the low quadword into the third register.

In step 614, a value is added to the memory address stored in the first register to have the memory address correspond to the next higher quadword boundary to the right. Considering a quadword consists of eight bytes, a value of 8 is added to the memory address stored in the first register. In step 616, the modified memory address stored in the first register is used to load a unit of data from the memory/cache into a fourth register (hereinafter referred to as the higher quadword). In an embodiment using MMX instructions, the MOVQ instruction is used to transfer the data.

Next the lower quadword is to be shifted by a scalar count to eliminate the unnecessary bytes from the lower quadword (i.e., the bytes between the low quadword boundary and the byte that corresponded to the original memory address). Recall that the value stored in the second register (the original memory address with all but the low 3 bits masked off) represents the number of bytes the original memory address was offset from the lower quadword boundary. However, in one embodiment using MMX instructions, the shift count value represents the number of bit positions to be shifted rather than bytes. As a result, in step 618, the value stored in the second register is to be converted to represent the number of bits to be shifted/removed, by multiplying the value by 8. In one embodiment, the value in the second register is shifted to the left by 3 bit positions (in the case of an embodiment where values are stored in registers with the low order to the right and high order to the left).

In step 620, the lower quadword is shifted by a number of bit positions equal to the converted value stored in the second register, generated in step 618. In one embodiment using MMX instructions, the low quadword is right shifted by executing a PSRLQ instruction. The shift is to the right in an embodiment implemented on Intel architecture because data is stored in registers in the Intel architecture with high order to the left and low order to the right.

In step 622, the number of bytes to be removed from the high quadword is determined. In one embodiment, this amount is determined, as a number of bit positions, by subtracting from 64 (eight bytes) the converted value stored in the second register, with the result stored in the second register. Thereafter, in step 624, the high quadword is left shifted a number of bit positions equal to the new value stored in the second register. In an alternative embodiment, the shift values for the high quadword and low quadword could be predetermined and loaded into a register as needed.
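The arithmetic of steps 618 and 622 is small enough to show directly. In the C sketch below the function and parameter names are hypothetical; the byte offset is the value left in the second register after step 608, and it is assumed to be nonzero because an offset of zero takes the aligned path instead.

```c
#include <assert.h>

/* Derives both shift counts, in bit positions, from the byte offset. */
static void shift_counts(unsigned offset, unsigned *low_shift, unsigned *high_shift)
{
    assert(offset >= 1 && offset <= 7);   /* offset 0 never reaches these steps */
    *low_shift  = offset << 3;            /* step 618: multiply by 8 (bytes to bits) */
    *high_shift = 64 - *low_shift;        /* step 622: remainder of the 64-bit quadword */
}
```

For an offset of 3 bytes the sketch yields 24 bit positions for the low quadword and 40 for the high quadword. Because the offset ranges over only seven values, the pairs could equally be precomputed, as the alternative embodiment suggests.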

In step 626, the shifted low quadword stored in the third register and the shifted high quadword stored in the fourth register are joined by performing a logical OR operation (POR in the case of MMX) between the two quadwords. As a result, the original unaligned quadword addressed by the original unaligned memory address has been obtained, but has been obtained via two aligned memory accesses, which is significantly faster than performing a single unaligned memory access.
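Steps 602 through 626 can be drawn together in a short C sketch. It is an illustrative model under stated assumptions rather than the MMX routine itself: memory is modeled as a byte array whose index plays the role of the memory address, the hypothetical helper load_quadword stands in for an aligned MOVQ and places the lowest-addressed byte in the low-order bits as on Intel architecture, and ordinary 64-bit shifts and OR stand in for the packed shift and POR instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for an aligned MOVQ load (hypothetical helper). */
static uint64_t load_quadword(const uint8_t *mem, size_t addr)
{
    uint64_t q = 0;
    assert((addr & 7) == 0);                    /* only aligned loads are issued */
    for (int i = 7; i >= 0; --i)
        q = (q << 8) | mem[addr + (size_t)i];   /* byte 0 ends up in the low-order bits */
    return q;
}

/* The caller must ensure the 16 bytes starting at the lower boundary exist. */
static uint64_t load_unaligned_quadword(const uint8_t *mem, size_t addr)
{
    size_t   aligned = addr & ~(size_t)7;       /* step 606: lower quadword boundary */
    unsigned offset  = (unsigned)(addr & 7);    /* step 608: bytes past the boundary */

    if (offset == 0)                            /* steps 610-611: already aligned */
        return load_quadword(mem, aligned);

    uint64_t low  = load_quadword(mem, aligned);      /* step 612 */
    uint64_t high = load_quadword(mem, aligned + 8);  /* steps 614-616 */

    unsigned low_shift  = offset << 3;          /* step 618: bytes to bits */
    unsigned high_shift = 64 - low_shift;       /* step 622 */

    low  >>= low_shift;                         /* step 620: drop bytes below the address */
    high <<= high_shift;                        /* step 624: drop bytes beyond the unit */
    return low | high;                          /* step 626: join the two halves */
}
```

With a memory image in which each byte holds its own index, a request for address 3 returns 0x0A09080706050403, that is, bytes 3 through 10 with byte 3 in the low-order position. The early return for an offset of zero matters in this model, since a C shift of a 64-bit value by 64 positions is undefined.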

The method for performing the unaligned memory access as described above can be provided in applications (e.g., video applications) to potentially increase the performance of the applications by decreasing the time to perform unaligned memory accesses. Moreover, applications that include the method as described above can be stored in memory of a computer system as a set of instructions to be executed. In addition, the instructions to perform the methods as described above could alternatively be stored on other forms of computer-readable medium, including magnetic and optical disks. For example, the method of the present invention can be stored on computer-readable mediums, such as magnetic disks or optical disks, that are accessible via a disk drive (or computer-readable medium drive), such as the disk drive shown in FIG. 6.

Alternatively, the logic to perform the methods as discussed above, including the method of performing cache-aligned memory access, could be implemented in additional computer and/or machine readable mediums, such as discrete hardware components such as large-scale integrated circuits (LSI's), application-specific integrated circuits (ASIC's), firmware such as electrically erasable programmable read-only memory (EEPROM's); and, electrical, optical, acoustical or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.).

What is claimed is:
 1. A method of accessing data from a memory comprising: masking off a portion of a first memory address; accessing a first unit of data corresponding to the first memory address, the first unit of data aligned with a memory boundary line; adding a predetermined offset to the first memory address to generate a second memory address; accessing a second unit of data corresponding to the second memory address, the second unit of data aligned with a memory boundary line; shifting off a section of the first unit of data and shifting off a separate section from the second unit of data; and joining the first unit of data and the second unit of data.
 2. The method of claim 1 further comprising: prior to masking off a portion of the first memory address, generating a second copy of the first memory address; masking off a portion of the second copy of the first memory address.
 3. The method of claim 2 further comprising: after masking off a portion of the second copy of the first memory address, multiplying the second copy of the first memory address by a predetermined factor to generate a first shift value; and subtracting the first shift value from a predetermined value to generate a second shift value.
 4. The method of claim 3, wherein shifting off a section of the first unit of data includes shifting the first unit of data by an amount equal to said first shift value; and shifting off a section of said second unit of data includes shifting the first unit of data by an amount equal to said second shift value.
 5. The method of claim 4, wherein joining the first unit of data and the second unit of data includes performing a logical OR operation between the first unit of data and the second unit of data.
 6. The method of claim 5, wherein said first and second units of data include packed data units having multiple units of data.
 7. A method of accessing data from a memory comprising: masking off a portion of a first memory address; accessing a first packed data unit corresponding to the first memory address via a MOVQ instruction, the first packed data unit aligned with a memory boundary line; adding a predetermined offset to the first memory address to generate a second memory address; accessing a second packed data unit corresponding to the second memory address via a MOVQ instruction, the second packed data unit aligned with a memory boundary line; shifting off a section of the first packed data unit and shifting off a separate section from the second unit packed data unit via a Pack Shift instruction; and joining the first packed data and the second packed data units via a POR.
 8. A computer-readable medium having stored thereon a set of instructions, said set of instruction for accessing data from a memory, which when executed by a processor, cause said processor to perform a method comprising: masking off a portion of a first memory address; accessing a first unit of data corresponding to the first memory address, the first unit of data aligned with a memory boundary line; adding a predetermined offset to the first memory address to generate a second memory address; accessing a second unit of data corresponding to the second memory address, the second unit of data aligned with a memory boundary line; shifting off a section of the first unit of data and shifting off a separate section from the second unit of data; and joining the first unit of data and the second unit of data.
 9. The computer-readable medium of claim 8 further comprising: prior to masking off a portion of the first memory address, generating a second copy of the first memory address; masking off a portion of the second copy of the first memory address.
 10. The computer-readable medium of claim 9 further comprising: after masking off a portion of the second copy of the first memory address, multiplying the second copy of the first memory address by a predetermined factor to generate a first shift value; and subtracting the first shift value from a predetermined value to generate a second shift value.
 11. The computer-readable medium of claim 10, wherein shifting off a section of the first unit of data includes shifting the first unit of data by an amount equal to said first shift value; and shifting off a section of said second unit of data includes shifting the first unit of data by an amount equal to said second shift value.
 12. The computer-readable medium of claim 11, wherein joining the first unit of data and the second unit of data includes performing a logical OR operation between the first unit of data and the second unit of data.
 13. The computer-readable medium of claim 12, wherein said first and second units of data include packed data units having multiple units of data.